Virtual parallel computing and a search algorithm using matrix product states 
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We propose a form of parallel computing on classical computers that is based on matrix product 
states. The virtual parallelization is accomplished by evolving all possible results for multiple inputs, 
with bits represented by matrices. The action by classical probabilistic 1-bit and deterministic 
2-bit gates such as NAND are implemented in terms of matrix operations and, as opposed to 
quantum computing, it is possible to copy bits. We present a way to explore this method of 
computation to solve search problems and count the number of solutions. We argue that if the 
classical computational cost of testing solutions (witnesses) requires less than C(n^) local two-bit 
gates acting on n bits, the search problem can be fully solved in subexponential time. Therefore, 
for this restricted type of search problem, the virtual parallelization scheme is faster than Grover's 
quantum algorithm. 



Interference and the ability to follow many history 
paths simultaneously make quantum systems attractive 
for implementing computations 0]. Efficient algorithms 
exploring these properties have been proposed to solve 
practical problems such as number factoring Q and un- 
sorted database search [^. However, we still do not have 
a sufficiently large and resilient quantum computer to 
take advantage of these algorithms. It is thus very de- 
sirable to try to find better and more efficient ways to 
compute with classical systems. In this regard, recent 
advances in our understanding of quantum many-body 
systems provide some guidance. It is well understood 
now that the time evolution of a large class of one- 
dimensional interacting systems can be efficiently sim- 
ulated by expressing their wave functions in a matrix 
product state form and by using a time-evolving block 
decimation (TEBD) [Q. A key aspect of this success 
is data compression: Even though many-body interac- 
tions tend to increase the rank of the matrices over time, 
it is possible to use truncation along the evolution to 
keep the matrices relatively small, such that the result- 
ing wave function approximates quite accurately the ex- 
act one without an exponential computation cost . In 
quantum systems, it is well understood that local inter- 
actions do not quickly entangle one-dimensional many- 
body state, justifying the matrix truncation [|[ 

In this Letter, we describe a method of classical com- 
putation that utilizes matrix product states (MPS) to 
implement search and other similar tasks. Compression, 
when possible, provides additional speedup. Formally, 
instead of working with wave functions and quantum am- 
plitudes, we describe the state of the computer in terms 
of a stochastic probability distribution written as traces 
of matrix product states associated to bit configurations. 
The idea of expressing classical probability distributions 
in the form of MPS is not new |^ , but the focus so far has 
been on using it to study non-equilibrium phenomena of 
physical systems (see for instance Ref. ^. As we show 
below, an MPS formulation of classical probability distri- 



butions can also be employed to create a virtual parallel 
machine where all possible outcomes of an algorithm are 
obtained for all 2" inputs of an n-bit register. Informa- 
tion about these outcomes is encoded and compressed 
in the matrices forming the MPS. By itself this "paral- 
lelism" is not obviously useful; it is, however, if a certain 
problem can use the probability of a single outcome at 
a time. This is the case of a search problem that seeks, 
for a given y, the value of x such that y = f{x) for an 
algorithmically computable function /. Then, the focus 
is not on all values of the output, but on only one given y. 
We shall show below that in this case matrix computing 
can be useful. In particular, from the probability of y, 
the method directly provides the number of input values 
X satisfying the functional constraint y = f{x). 

In our matrix computing, insertion and removal of bits 
arc allowed and 1-bit and 2-bit gates can be implemented 
much like in a conventional computer. Our 1-bit gates 
are probabilistic while our 2-bit gates are deterministic. 
2-bit gates rely on a singular value decomposition (SVD) 
to maintain the MPS form of the probability distribution. 
All these operations preserve the positivity and the over- 
all normalization of the probability even though we work 
with non-positive matrices. (We find that even when ma- 
trices are truncated, no significant negative probabilities 
are produced for practical calculations.) 

Matrix computing formulation - Consider a set of bi- 
nary variables {xj = 0, l}j=i,...,n describing a set of n 
bits, with \xi X2 ■ ■ -Xn) = |a;) denoting a particular con- 
figuration of this system. In analogy to quantum me- 
chanics, we define the vector 



IP) = 



P{xi,...,Xn)\xi...Xn), (1) 



Xn , . . .,a;i— 0,1 

where 

P(xi,...,x„) = tr(Mf^--.M^) 
Here, each M^^ is a real matrix of dimensions D 



(2) 



The trace can be dropped if we consider the first and last 
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matrices to be row and column vectors, i.e., Dq = Dn = 
1. The state vector is normalized in the following sense: 
Define |S) = ^^^^^^ \xi . . . a;„), then Z = (S|P) = 

1 since ^i^) = 1- 

Starting from an initial probability distribution 
Po{xi, . . . ,Xn), the vector \P) evolves by the application 
of 1-bit and 2-bit gates, with the latter always acting on 
adjacent bits. 

• 1-bit gates: We will use probabilistic one-bit gates, 
which take states 0,1 to states 0, 1 with probabilities 
p,l — p and q,l — q: 

OJL^Q or 0^1 
llzi>0 or 1^1. 

The probabilities can be encoded into a transfer function 
i"'" that takes a logic input a = 0, 1 into a logic output 
a = 0, 1. Explicitly: = p, t^-" = I - p, t"'^ = 1 - g, 
t^'^ = q. A 1-bit gate acting on bit j yields a new matrix 



E 



(3) 



The transfer function satisfies the sum rule X]a=o i ^ 
1, which ensures that the normalization Z ~ 1 is main- 
tained as the system evolves. Examples of 1-bit gates 
arc: (a) Deterministic NOT, with p ~ and q = 0, (b) 
RAND, with p = 1/2 and q = 1/2, which randomizes the 
bit, (c) RST, with p — 1 and q — 0, which resets the bit 
to 0. 

• 2-bit gates: We will consider only deterministic two- 
bit gates. Given two logical functions A(a, b) and B(a, 6), 
we construct the transfer function T"-^'''^, taking bits with 
states a and b to bits with states d and 6, respectively: 



T 



db^ab 



1, a = A{a, b) and b = B{a, h), 
0, otherwise. 



(4) 



Similarly to 1-bit gates, the normalization after 2-bit 
gates is preserved by the sum rule j^^q ^ T° 
The evolved matrices must satisfy 



b.ab 



1. 



1 ,a;'^0,l 



(5) 

and we use the SVD to decompose the result of the gate 
operation on the right-hand side of Eq. (^) as a product 
of two matrices, as in the left-hand side of the equation, 
for all the four cases Xj-i,Xj = 0, 1. 

Let us demonstrate this construction with a concrete 
example. Consider the following logical operation on bits 
j — I and j: ^NAND(a,^) = a and SNAND(a,^) = aAb. 
The first bit is unaffected, while the second one evolves 
into the NAND operation between the two bits. In this 
case, TOi^o" = T01^"i = T"^i" = T^o^" = 1, with all 
other elements set to zero. We use the transfer function 



to determine the four blocks (for Xj-i,Xj = 0,1) of a 



matrix Mf^^f of dimension 2Dj_2 x 2Dj: 



M 



NAND 



(6) 



To factor the matrix A^i.i-i back into a product, we 
employ an SVD, 



M^.^-l 



SVD 



M? 



M} 



MU Mil 



(7) 



In this process, the common dimension Dj^i may change 
and likely increase. This is an issue of fundamental im- 
portant, which we shall return when we discuss a search 
algorithm. 

• Bit insertions and removals: For computational tasks 
such addition and multiplication, it is important to be 
able to insert and remove bits. These operations are 
straightforward for MPS. Insertion of a new bit (say, 
initially set to 0) in between bits j — 1 and j amounts 
to replacing Mj^i' M^' with M^ri' M^" M^' , where 
and A/^ arc x -Dj-i null and identity matrices, 

respectively, and the total sum over bit configurations 
in the vector \P) [see Eq. (|^)] has now to include 
the binary variable Xa = 0, 1. Removal of a bit is 
done by absorbing its matrix into the one of an adja- 
cent bit, namely, by tracing it out; for instance, we use 
Ex,=o,i M^'MJ^T = Mj^T to remove bit j. 

How can matrix computing can be used to solve cer- 
tain computational problems? - Here we shall present 
computational algorithms that explore the virtual paral- 
lelism encoded in matrix product states. To be concrete, 
consider the following search problem as an example: 

Given a function y = C{x) that can be com- 
puted algorithmically with 0{n'^) gates and a cer- 
tain value for y, we would like to search for an input 
X that yields as output y ~ f{x). 

The reason why matrix computation is useful for this 
search problem can be argued as follows. Matrix product 
states can express the probability values of all possible m- 
bit outputs y = yiy2 ■ ■ .ym if one starts with a product 
state encoding all possible n-bit inputs x = X1X2 ■ . .Xm 
namely, P{x) = 2^" for all x. Of course, if we were inter- 
ested in all the probabilities, we would have to compute 
an exponentially large (2™) number of traces of products 
of matrices. But this is not what is needed to perform 
the search above: We are interested in just one output y 
for this problem. We thus proceed in the following steps. 

1. Starting with all bits Xi, i — 1,...,77,, random- 
ized with equal probabilities 1/2 for being or 1, 



we compute the final output matrices M, 
1, . . . , m. 
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2. We compute the probability P{y) for the given y 
we are interested in. If P{y) > 2~", then there is 
at least one value of x such that y = f{x). 

3. We then fix one of the input bits, say xi, to be 
0, instead of randomizing it. Wc recompute the 
output matrices M^' , j = 1, . . . , m, and the new 
probability P{y). Again we test if P{y) > 2~". If 
the probability fell below the threshold, we must 
reset xi to 1. (Notice that since there may be more 
than one x for a given y, that P{y) stays above 
threshold does not mean that switching to = 1 is 
necessarily forbidden, but we shall stick instead to 
xi = in this case to avoid unnecessary iterations.) 

4. We repeat step 3 fixing now input bit X2, then re- 
peat it again fixing input bit 0:3, and so on until 
we finally fix input bit a;„. At the end of n steps, 
having fixed all the n bits of the input, we have 
arrived at one value for x such that y = f{x). 

Let us discuss the computational cost of such algo- 
rithm. To simplify the discussion, let us present it in 
terms of the largest matrix dimension D in the compu- 
tations, which we shall relate to the number Ug of gates 
involved in the computation of the function f{x). All 
SVD steps involve matrices with rank smaller or equal to 
D; therefore, the cost associate to gate operations is no 
more than 0{ng x D^). One has also to compute the trace 
of the matrix products for a fixed y to yield the probabil- 
ity P(y), and this takes time 0{nx D), which we discard 
in comparison with the SVD steps. We then have to re- 
peat the procedure fixing bit- by- bit the Xi, i = 1, . . . , n. 
Therefore, in the worst case it takes a time 0{nx Ug x D^) 
to find X. 

The largest computational cost comes from the SVD 
steps, which depends on the rank D of the matrices. 
The crucial issue is how D scales with either the number 
of bits n or the number of gates Ug for a given algo- 
rithm to compute f{x). We shall break the discussion 
below into two cases. The first one focuses on the case 
where all the singular values are kept and no approxi- 
mations are made. The purpose of this discussion is to 
show that, even without approximations, matrix comput- 
ing can yield search algorithms that perform faster than 
Grover's quantum algorithm depending on the complex- 
ity involved in computing the function /(x). The second 
case is more applied, and makes use of the Eckart- Young 
theorem to best approximate the matrices by others 
of rank Dcut < -D by keeping only the largest Dcut sin- 
gular values in the SVD steps. Of course the usefulness 
of computing with the truncated singular values depends 
on how compressible the partial answers are in each step 
of the computation. 

Computational costs without discarding singular val- 
ues - We shall prove below the following result: The 
maximum dimension of any matrix in a computation 



using Ug gates in a system with n bits is bounded by 
D < D^^^in.ng) = min (^2L\/2^J , 2L"/2J ^ The conse- 
quence of this result on the computational time is as fol- 
lows. As we argued above, the search algorithm takes a 
time 0{nxngX D^). For a function y = f{x) that can be 
computed with rig ~ nf^ gates, the time to search for an x 
that gives a fixed y has two different behaviors depending 
on whether d < 2 or d > 2. If d < 2, D^ax 2^""^', 
and thus the search takes, in the worst possible case, a 
time ©(n'^+i x 23^""''') using matrix computing algo- 
rithms. If instead d> 2, Dmax saturates to Umax ~ 2"/^ 
and in the worst possible case the computation (without 
discarding singular values) takes exponential time. In 
other words, there is a transition between subexponen- 
tial and exponential behavior at dc = 2. It thus follows 
that for any function f{x) that can be computed with 
Ug < 0{n'^) gates, the full search problem can be solved 
faster using matrix computing than using Grover's quan- 
tum algorithm, which scales as 0(2"/-^). 

Proof of the bound on the largest bond dimension - 
Upon application of a 2-bit gate on bits j — 1 and j, the 
dimension -Dj-i will increase as follows. Starting with 
Dj^2 X -Dj-i matrices Afj£^^ and -Dj-i x Dj matrices 

A/J^. one assembles a 2Dj^2 x 2Dj matrix M^'^i j [see 
the example of the NAND gate in Eq. (|)]. The SVD step 
will lead to x ^j-i matrices Mji^^ and Dj-i x Dj 

matrices AfJ^, where the new bond dimension Dj-i = 
min(2£'j_2, 2Z?j). It is useful to work on a logarithmic 
scale and define hj = log2 Dj. Thus we can write hj-i = 
min(/ij_2, hj) + 1. 

Let us next prove that at any step in the algorith- 
mic evolution the "entanglement heights" hj satisfy the 
condition \hj — hj-i\ < l,Vj, which we shall refer to 
as the height difference constraint (hdc). The proof is 
done by induction. At the initial state of the calculation, 
one starts with the product state of all possible equally 
weighted inputs x, which correspond to 1 x 1 matrices or, 
equivalently, all hj = 0, so that \hj — hj-i\ = < 1, thus 
satisfying the condition. Now suppose that the condi- 
tion is satisfied at step r; we can show that it is then 
also satisfied at step t 4- 1, when a 2-bit gate is ap- 
plied between two adjacent bits j — 1 and j. None of 
the heights other than hj^i — > /ij-i are changed, there- 
fore the hdc condition \hj — < 1 remains satisfied 
for all z < J — 1 and i > j, and it just remains to be 
shown that it is satisfied for i = j — 1 and i = j. Con- 
sider the case where hj-2 < hj (the other case hj < hj-2 
is analogous). In this case = hj-2 + 1, satisfy- 
ing the condition — hj^2\ < 1- Now hj — /ij-i = 
hj — ~ 1 = {hj — ^j-i) + [hj-i — hj-2) — 1, and 
using that hj — /ij-i < 1 and hj^i — hj-2 < 1, as well 
as that hj-2 < hj, wc have that \hj — hj-i\ < 1. It 
thus follows that the hdc condition \hj — hj-i \ < l,Vj is 
satisfied at all steps in the calculation. An example of a 
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FIG. 1. Example of a configuration of entanglement heights 
{hj = logj Dj ) satisfying the height difference constraint 
l^j ^ ^3-i-\ — when n = 12. The dashed line shows 

the configuration with maximum heights. 

configuration of entanglement heights satisfying the hdc 
is show in Fig. |l|. 

If all we do to evolve the state is to apply 2-bit gates, 
we have shown that \hj — hj_i \ < l,Vj. It is easy to see 
that after a bit insertion the condition is still satisfied, 
because the change in height is zero on the two sides of 
the inserted bit (corresponding to a square matrix) , with 
all other relative height differences unchanged. The re- 
moval (tracing out) of bits is slightly more subtle. Right 
after the removal, there are large jumps across the region 
where the bits were removed, but these can be brought 
up to satisfy the hdc by applying a series of 2-bit iden- 
tity gates [A{a, b) = a and B{a, h) = b] sweeping from 
left-to-right followed by another from right-to- left. These 
sweeps remove the height "faults" (and actually tend to 
decrease the overall height). Therefore we arrive at the 
result that the hdc condition is satisfied after all opera- 
tions, 2-bit gates, bit insertions, and bit deletions (after 
the identity sweeps). 

Let us now show that the maximum height resulting 
from the application of Ug 2-bit gates is bounded by 
^max < [■\/2 rigj . The application of a single 2-bit gate 
on bits J — 1 and j changes the height /ij-i — > /ij-i = 
min(/ij_2, hj) + 1. Because the relative heights of neigh- 
boring bonds cannot differ by more than 1 unit due to 
the hdc, the maximum amount that the height /ij-i can 
increase with respect to hj-i is by 2 (which occurs when 
hj-2 = hj = hj-i -t- 1). Therefore one can write that 
S = — Now, suppose that the maximum 

height is /imax at some bond labelled by imax (located 
to the right of bit imax); because the heights ho to the 
left of the 1st bit and /i„ to the right of the nth bit arc 
both equal to at all times, and because of the hdc con- 
dition, there are constraints on how quickly the heights 
can grow from to /imax at imax and then decrease down 
to again. The climb and descent that minimizes the 
area S can be trivially seen to be a triangle where hj 
increases linearly from j = imax — ^max to j = imax, and 
then decreases linearly until j = imax + ^max- The area of 
this triangle is S'min = /imax, and any other height profile 
that reaches the same maximum height /imax has larger 
or equal area. Therefore, /imax — ^ — ^^g, and thus we 
arrive at the conclusion that /imax < [■\/2 rigj , i.e., the 



bound on the maximum entanglement height for a given 
number of gates. Furthermore, because of the hdc and 
the fact that /ig = /i„ = 0, the entanglement height for a 
fixed j is bounded by hj < min(j, n — j), and the overall 
maximum /imax = L'^/2J is reached at the center of the 
chain, j = [n/2j and j = \n/2~\ (which coincide when n 
is even). 

Putting all the conditions together, we arrive at 
/imax < niin ([^2 rigj , [n/2j), or equivalently, the bound 

D < i:'max(n, rig) = min (2^V^i ^ 2L"/2J ^ which we used 
to obtain the absolute maximum running time of the 
search algorithm. 

When the ranks of the matrices do not grow as fast 
as in the worst case scenario discussed above, the cal- 
culations should run even faster. Another possibility for 
speed up is to keep only a subset of the singular values 
in the decompositions. 

Faster computations by keeping most relevant singular 
values? - Truncating the rank of the matrices by se- 
lecting only a subset of singular values coming from the 
SVD steps is standard procedure in quantum methods 
such as the TEBD, and the classical version for stochas- 
tic evolution, the cTEBD. Carrying out logic computa- 
tions in the way we present above can be regarded as a 
form of stochastic evolution, and therefore the analysis of 
stochastic evolution with cTEBD carried out in Ref. |^ 
applies. 

What criteria does a calculation must fulfill so as to 
be efficiently performed (in polynomial time in n) us- 
ing matrix computation? To address this question, let 
us start by presenting a necessary condition: The final 
state of the calculation P{y) must be compressible and 
thus writable as a matrix product state with matrices of 
dimension I?cut scaling as n", for some exponent a > 0. 
The condition is necessary but not sufficient because it 
is possible that the actual computation has intermediate 
steps with "entropic" barriers and those depend on the 
specific algorithm (including how cleverly it can be writ- 
ten) to compute f{x). However, focusing on the final 
state P{y) is useful because it allows us to investigate 
the applicability of the method, say by analyzing the be- 
havior and scaling for small n first, even before designing 
the sequence of gates that implements the algorithmic 
calculation of y = f{^)- For instance, one can compute a 
measure such as the entropy cost associated to partition- 
ing P{y), which plays a similar role to the entanglement 
entropy in a quantum matrix product state and is lower 
bounded by the mutual information [ p^ . 

Conclusions - We have shown that is is possible to 
achieve virtual parallelization in single-processor classi- 
cal computers using 1-bit and 2-bit local gates acting on 
matrix product states. We propose a search algorithm 
based on this method that is faster than Grover's quan- 
tum search algorithm when the cost to check a witness 
requires less than 0{n'^) 2-bit gates. Additional speedup 
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is possible in particular cases when either the rank of the 
matrices involved in the product state grows slowly as 
the computation progresses or the rank can be reduced 
by truncation during gate operations. The method is not 
limited to one-dimensional bit arrays and could in princi- 
ple be extended to higher dimension tensor products. Fi- 
nally, we point out that this method also naturally counts 
the number of satisfying assignments of a given Boolean 
formula, which is a problem of much importance in Com- 
puter Science. 
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